Automatic Selection of Suitable Sentences 
for Language Learning Exercises 


Ildikó Pilán, Elena Volodina, and Richard Johansson


Abstract. In our study we investigated second and foreign language (L2) sentence 
readability, an area little explored so far in the case of several languages, including 
Swedish. The outcome of our research consists of two methods for sentence selection 
from native language corpora based on Natural Language Processing (NLP) and 
machine learning (ML) techniques. The two approaches have been made available 
online within Lärka, an Intelligent CALL (ICALL) platform offering activities
for language learners and students of linguistics. Such an automatic selection 
of suitable sentences can be valuable for L2 teachers during the creation of new 
teaching materials, for L2 students who look for additional self-study exercises as 
well as for lexicographers in search of example sentences to illustrate the meaning 
of a vocabulary item. Members from all these potential user groups evaluated our 
methods and found the majority of the sentences selected suitable for L2 learning 
purposes. 

Keywords: sentence readability, Swedish, NLP, ICALL, CEFR, GDEX, retrieval, 
machine learning, supervised classification, corpus-based evidence. 

1. Introduction 

Native language (L1) texts are a valuable source of authentic sentences suitable
for the purposes of L2 learning, either as exercise items or as examples illustrating 
the meaning of a word. Before being able to use such sentences in CALL systems, 
however, we have to ensure that these examples are readable, i.e. understandable 
by learners both lexically and structurally. Identifying these sentences manually
would require a considerable amount of time. Instead, we propose two automated
selection methods which perform this task for Swedish. Both approaches have
been integrated into the online ICALL platform Lärka (Volodina, Borin, Loftsson,
Arnbjörnsdóttir, & Leifsson, 2012) as part of a sentence readability module called
HitEx (Hitta Exempel [Find Examples] or Hit Examples). The selection is based
on a number of linguistic factors found to influence L2 readability, as
well as principles of Good Dictionary Examples (GDEX) (Husák, 2008; Kilgarriff,
Husák, McAdam, Rundell, & Rychlý, 2008). The sentences selected by the current
version of the system have been evaluated by L2 Swedish teachers, learners and 
linguists, who provided us with positive feedback. 

2. Materials and method 

The materials used throughout the study included Swedish native language corpora 
of various genres (novels, newspapers and blog texts) which are accessible through 
an online tool called Korp (Borin, Forsberg, & Roxendal, 2012). Korp offers 
annotations at different linguistic levels for each sentence including parts of speech 
(POS), morphosyntactic and syntactic (dependency) relations, which have all been 
exploited in our selection methods. Furthermore, we employed the scale described 
in the Common European Framework of Reference for Languages (CEFR) when 
distinguishing L2 difficulty levels. Besides native language corpora, we also 
utilized the CEFR corpus (Volodina, Pijetlovic, Pilán, & Johansson Kokkinakis,
2013), a collection of L2 Swedish materials currently under development, and the
Kelly-list (Volodina & Johansson Kokkinakis, 2012), a frequency-based word list
with CEFR levels for each item. The platform Lärka, besides the HitEx module
in which our selection methods have been incorporated, also includes an exercise 
generator module (Volodina et al., 2013). 

The material described above served as the basis for our two selection methods: a
rule-based approach and a combined approach using rules as well as ML techniques.
As a starting point, we used an algorithm described in Volodina, Johansson, and
Johansson Kokkinakis (2012) based on four selection criteria. This initial set of rules was
extended with additions from the GDEX literature (Kilgarriff et al., 2008; Husák,
2008), as well as sentence selection research in the L2 context (Segler, 2007)
and readability studies for L1 Swedish (Heimann Mühlenbock, 2013; Sjöholm,
2012). The ML method used in the combined approach consisted of supervised
classification, a process in which our model learned to predict whether a sentence 
is understandable at B1 (intermediate) proficiency level or not, based on training 
examples from the CEFR corpus and native language corpora. The classification 
algorithm employed was a Support Vector Machine (SVM) classifier, which aimed
at finding a line (in higher dimensions, a hyperplane) separating the two classes
in the training data (within and above B1 level) based on the linguistic
characteristics (features) of each sentence. A visual
representation of this idea is presented in Figure 1 below.

Figure 1. Support vector machine classification 



Once trained, the SVM assigns previously unseen sentences from L1 corpora
to one of the two classes. The accuracy of the classifier expresses what
percentage of these classifications are correct.
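
The classification and accuracy steps described above can be illustrated with a minimal sketch. It assumes a linear decision boundary over two illustrative features; the weights, bias, feature choice and toy sentences are all invented for the example and are not the trained model or data used in the study.

```python
from typing import List, Sequence, Tuple

# Hypothetical weights and bias for two illustrative features:
# (sentence length in tokens, share of words above B1).
# These values are assumptions for the sketch, NOT the trained model.
WEIGHTS = (-0.1, -4.0)
BIAS = 2.0


def predict(features: Sequence[float]) -> int:
    """Return +1 (within B1) or -1 (above B1), depending on the side of the boundary."""
    score = sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS
    return 1 if score >= 0 else -1


def accuracy(samples: List[Tuple[Sequence[float], int]]) -> float:
    """Percentage of samples whose predicted class equals the gold label."""
    correct = sum(1 for features, gold in samples if predict(features) == gold)
    return 100.0 * correct / len(samples)


# Toy held-out sentences as (features, gold label); invented for illustration.
held_out = [
    ((8, 0.0), 1),     # short, no difficult words: classified as within B1
    ((12, 0.1), 1),    # classified as within B1
    ((25, 0.3), -1),   # long, many difficult words: classified as above B1
    ((10, 0.05), -1),  # gold label is above B1, but the sketch predicts within B1
]

print(accuracy(held_out))  # three of four classifications are correct
```

An SVM additionally chooses the boundary that maximizes the margin between the two classes during training; the sketch only shows how an already fixed boundary is applied and scored.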

3. HitEx: the L2 sentence readability module 

Through the graphical user interface of the HitEx module in Lärka, a number of
search criteria for the selection of sentences can be set. Figure 2 illustrates part of
this page.

Figure 2. The HitEx web page

On the left-hand side, 26 selection criteria, or parameters, are listed (only some of
these are visible in Figure 2), grouped into three categories: general, structural and
lexical. General parameters include basic information about the sentences to select,
namely the word to search for (keyword), its POS, the corpora from which to choose
the examples, etc. Structural parameters define morphosyntactic preferences.
These consist of parameters such as average sentence and word length,
percentage of relative pronouns as well as the optional avoidance of participles and
modal verbs.

Finally, lexical parameters include the avoidance of proper names, the allowed
percentage of words above the selected CEFR level, etc. Each parameter
value can be associated with a penalty score, which determines the final ranking of
the sentences based on how well they satisfy the search criteria. Predefined
settings are currently available for levels B1, B2 and C1+, together with a setting for
lexicographers (GDEX). As the two columns for the parameter
values in Figure 2 indicate, it is also possible to experiment with two different
settings simultaneously.
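
The penalty-based ranking can be sketched as follows. This is a minimal illustration that assumes penalties are simply summed over violated parameters and that a lower total ranks higher; the two parameters, their thresholds, their penalty values and the example sentences are hypothetical and do not reproduce the HitEx settings.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Parameter:
    """A selection criterion with the penalty applied when it is violated."""
    name: str
    violated: Callable[[List[str]], bool]
    penalty: float


# Hypothetical parameters; thresholds and penalty values are assumptions.
PARAMETERS = [
    Parameter("max_sentence_length", lambda tokens: len(tokens) > 8, 2.0),
    # Crude proper-name check: any capitalised token after the first one.
    Parameter("avoid_proper_names",
              lambda tokens: any(t[0].isupper() for t in tokens[1:]), 1.0),
]


def total_penalty(tokens: List[str], parameters: List[Parameter]) -> float:
    """Sum the penalties of all parameters the sentence violates."""
    return sum(p.penalty for p in parameters if p.violated(tokens))


def rank(sentences: List[str], parameters: List[Parameter]) -> List[Tuple[float, str]]:
    """Return (penalty, sentence) pairs, best (lowest penalty) first."""
    scored = [(total_penalty(s.split(), parameters), s) for s in sentences]
    return sorted(scored, key=lambda pair: pair[0])


SENTENCES = [
    "Anna och Erik läser en bok i Göteborg varje dag.",
    "Hon bor i Stockholm.",
    "Jag läser en bok.",
]

for penalty, sentence in rank(SENTENCES, PARAMETERS):
    print(penalty, sentence)
```

Summing penalties rather than discarding a sentence at the first violation keeps imperfect candidates available, which matters when few sentences contain the keyword.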

Instead of using only parameters, the ML component, which we called LaSAS
(Lätt/Läs Svenska som Andra Språk [Easy / Read Swedish as a Second Language]),
can be selected to be used in combination with some of the parameters. LaSAS
classifies sentences based on a large number of linguistic features such as the
average number of senses per word, the frequency and CEFR level of words and
aspects of syntactic complexity. These features are based on Swedish L1 readability
studies (Heimann Mühlenbock, 2013; Sjöholm, 2012), L2 readability research for
other languages (François & Fairon, 2012; Vajjala & Meurers, 2012) and
CEFR-based course book syllabuses (Levy Scherrer & Lindemalm, 2009). Currently,
LaSAS can determine with 70% accuracy whether a sentence is understandable at
B1 level or not.
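
To make the notion of a feature concrete, the sketch below computes three simple values for a tokenized sentence. It is only an illustration: the word-level dictionary is a tiny invented stand-in for a CEFR-graded list such as the Kelly-list, treating unknown words as C2 is an assumption, and the three features are a small subset chosen for the example rather than the feature set of LaSAS.

```python
from typing import Dict, List

CEFR_ORDER = ["A1", "A2", "B1", "B2", "C1", "C2"]

# Tiny stand-in for a CEFR-graded word list; the levels are invented.
WORD_LEVELS = {
    "jag": "A1", "läser": "A1", "en": "A1", "bok": "A1", "om": "A1",
    "lagstiftning": "C1",
}


def extract_features(tokens: List[str], target_level: str = "B1") -> Dict[str, float]:
    """Compute simple lexical and length features for one tokenized sentence."""
    target = CEFR_ORDER.index(target_level)
    words = [t.lower() for t in tokens if t.isalpha()]  # drop punctuation
    if not words:
        return {"sentence_length": len(tokens), "avg_word_length": 0.0,
                "pct_above_level": 0.0}
    # Assumption: words missing from the list are treated as the hardest level.
    above = [w for w in words
             if CEFR_ORDER.index(WORD_LEVELS.get(w, "C2")) > target]
    return {
        "sentence_length": len(tokens),
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "pct_above_level": 100.0 * len(above) / len(words),
    }


features = extract_features(["Jag", "läser", "en", "bok", "om", "lagstiftning", "."])
print(features)
```

Each sentence is thus reduced to a vector of numbers, which is the form in which a classifier receives it.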

Figure 3 below presents the structure of the readability module and the process
of sentence selection. Once users provide their preferences through the dedicated
web page in Lärka, the corpus tool, Korp, searches for sentences containing the
keyword in Swedish L1 texts. In the next step, sentences undergo selection
with the method previously chosen by the user, which is either purely based
on parameters or is a combination of parameters and ML classification with
LaSAS. Finally, the resulting filtered set of sentences is displayed on the web
page, where they can be edited and downloaded to a file. The sentence selection
methods are also available as a web service, so they can be easily integrated
into other applications.
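
The selection process described above can be sketched as a small pipeline. Everything in it is hypothetical: the in-memory corpus stands in for a Korp query, the single length rule stands in for the full parameter set, and the toy classifier stands in for LaSAS; none of the function names belong to the actual web service.

```python
from typing import Callable, List, Optional

# Stand-in for a corpus; the real system queries Korp. Sentences are invented.
CORPUS = [
    "Jag läser en bok .",
    "Jag läser en bok som handlar om en komplicerad lagstiftning i Europa .",
    "Hon lånar ofta en bok på biblioteket .",
    "Han sover .",
]


def search_corpus(keyword: str) -> List[str]:
    """Step 1: retrieve sentences containing the keyword as a token."""
    return [s for s in CORPUS if keyword in s.lower().split()]


def passes_rules(sentence: str, max_tokens: int = 10) -> bool:
    """Step 2a: rule-based filter (a single hypothetical length rule)."""
    return len(sentence.split()) <= max_tokens


def toy_classifier(sentence: str) -> bool:
    """Step 2b: placeholder for an ML classifier; accepts only short words."""
    return all(len(token) <= 8 for token in sentence.split())


def select_sentences(keyword: str,
                     classifier: Optional[Callable[[str], bool]] = None) -> List[str]:
    """Run the search, the rules and, in the combined mode, the classifier."""
    selected = [s for s in search_corpus(keyword) if passes_rules(s)]
    if classifier is not None:  # combined approach: rules plus ML
        selected = [s for s in selected if classifier(s)]
    return selected


print(select_sentences("bok"))                  # rule-based mode
print(select_sentences("bok", toy_classifier))  # combined mode
```

Passing the classifier as an optional argument mirrors the choice between the two modes: the rules always apply, and the ML step is added only when requested.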

Figure 3. The structure of the HitEx readability module

4. Evaluation

To verify whether the sentences selected by our system are suitable for L2 learning
purposes, we carried out an evaluation with a total of 34 participants, comprising
L2 Swedish teachers, students and linguists (including one lexicographer). The
respondents evaluated a list of 196 sentences chosen with our two selection
approaches. Students were asked to indicate whether they understood the
sentences, whilst teachers and linguists were asked to judge whether
B1 learners would comprehend them. Altogether, 73% of the
presented items were considered understandable. There was, however, a significant
difference in the percentage of understandable examples across the
subgroups of respondents. Figure 4 below shows this discrepancy.

Teachers were considerably stricter than linguists when judging understandability,
regarding 17% fewer sentences as acceptable. The first subgroup of learners (adults
with university-level education) understood 10% more sentences than students
above 16 years with mixed educational background (Students 2) and 34% more
than 15-year-old high-school students (Students 3). Overall, learners understood
69% of the examples, 4% more than teachers predicted.

Besides the aspect of understandability, teachers and linguists were also asked to
decide whether the sentences would be suitable as exercise items or as examples
for vocabulary illustration. About six out of ten sentences met these
criteria. For all three aspects investigated, the purely rule-based approach was
slightly preferred (by 3%) to the combined method.

Figure 4. Percentage of understandable sentences per respondent subgroup

During the evaluation, qualitative data were also collected, consisting of
respondents' comments about difficult or undesirable elements in the sentences.
These included, for example, atypical word order, subordinate clauses and the presence
of infrequent idioms. Respondents also mentioned the lack of sufficient context,
informal spelling and a preference for illustrating the most frequent usage of a
word.

5. Conclusions 

We proposed two methods for the selection of sentences from native language 
corpora which are suitable for L2 learning purposes. According to the results of 
an empirical evaluation, the approach based only on parameters was somewhat 
more successful than the one combining rules and ML techniques. The results are
encouraging: about 70% of the sentences proved to be of an appropriate level of
difficulty. About 10% fewer were suitable as exercise items and example sentences
for vocabulary item illustration. The selection methods have found practical
application in exercise generation within an ICALL platform, and they are also available
as a web service. In the future, we intend to extend the selection to all CEFR levels
and we also plan to refine the methods further in an attempt to improve the suitability
of the sentences chosen.